Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence

Authors

  • Mathias Bæk Tejs Knudsen
  • Morten Stöckel
Abstract

Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental measure of how “random” a hash function or a random number generator is, is its independence: a sequence of random variables is said to be k-independent if every variable is uniform and every subset of size k is independent. In this paper we consider three classic algorithms under limited independence. Besides the theoretical interest in removing the unrealistic assumption of full independence, the work is motivated by lower independence being more practical. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.

  • Randomized quicksort. When pivot elements are computed using a 5-independent hash function, Karloff and Raghavan (J.ACM’93) showed O(n log n) expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only 4-independence.
  • Min-wise hashing. For a set A, consider the probability of a particular element being mapped to the smallest hash value. It is known that 5-independence implies the optimal probability O(1/n). Broder et al. (STOC’98) showed that 2-independence implies it is O(1/√|A|). We show a matching lower bound as well as new tight bounds for 3- and 4-independent hash functions.
  • Largest bucket. We consider the case where n balls are distributed to n buckets using a k-independent hash function and analyze the largest bucket size. Alon et al. (STOC’97) showed that there exists a 2-independent hash function implying a bucket of size Ω(n^{1/2}). We generalize the bound, providing a k-independent family of functions that imply size Ω(n^{1/k}).
Research partly supported by Mikkel Thorup’s Advanced Grant from the Danish Council for Independent Research under the Sapere Aude programme and the FNU project AlgoDisc (Discrete Mathematics, Algorithms, and Data Structures). One of the authors is supported by the Danish National Research Foundation under the Sapere Aude program.

arXiv:1502.05729v1 [cs.DS] 19 Feb 2015
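To make the notion of k-independence in the abstract concrete, here is a minimal Python sketch. It assumes the standard construction of a k-independent family from a random polynomial of degree k-1 over a prime field; this is an illustration, not the families constructed in the paper. The names `P`, `make_k_independent_hash`, `largest_bucket`, and `min_hash_frequency` are hypothetical. Reducing a hash value modulo n, as `largest_bucket` does, gives only approximately uniform buckets.

```python
import random

# Illustrative field size (the Mersenne prime 2^31 - 1); keys must lie in [0, P).
P = 2_147_483_647


def make_k_independent_hash(k, rng):
    """Sample h(x) = (a_0 + a_1*x + ... + a_{k-1}*x^(k-1)) mod P.

    With coefficients drawn uniformly from [0, P), the hash values of any
    k distinct keys in [0, P) are independent and uniform over [0, P).
    """
    coeffs = [rng.randrange(P) for _ in range(k)]

    def h(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + a) % P
        return acc

    return h


def largest_bucket(n, k, rng):
    """Throw keys 0..n-1 into n buckets and return the fullest bucket's size."""
    h = make_k_independent_hash(k, rng)
    counts = [0] * n
    for x in range(n):
        counts[h(x) % n] += 1  # the mod-n step is only approximately uniform
    return max(counts)


def min_hash_frequency(set_size, k, trials, rng):
    """Fraction of trials in which key 0 attains the minimum hash value
    among keys 0..set_size-1 (ties count as a hit)."""
    hits = 0
    for _ in range(trials):
        h = make_k_independent_hash(k, rng)
        values = [h(x) for x in range(set_size)]
        if values[0] == min(values):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(2015)
    for k in (2, 3, 4, 5):
        print(k, largest_bucket(1000, k, rng), min_hash_frequency(50, k, 2000, rng))
```

A random polynomial family like this one typically behaves far better than the worst case; the lower bounds summarized in the abstract concern specially constructed k-independent families, so this experiment is not expected to reproduce them.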


Similar resources

Lower bounds on $q$-wise independence tails and applications to min-entropy condensers

We present novel and sharp lower bounds for higher load moments in the classical problem of mapping M balls into N bins by q-universal hashing, specialized to the case when M = N . As a corollary we prove a tight counterpart for the result about min-entropy condensers due to Dodis, Pietrzak and Wichs (CRYPTO’14), which has found important applications in key derivation. It states that condensin...


A Derandomization Using Min-Wise Independent Permutations

Min-wise independence is a recently introduced notion of limited independence, similar in spirit to pairwise independence. The latter has proven essential for the derandomization of many algorithms. Here we show that approximate min-wise independence allows similar uses, by presenting a derandomization of the RNC algorithm for approximate set cover due to S. Rajagopalan and V. Vazirani. We also ...


Linear Probing with 5-wise Independence

Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key,value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free...


Extendible Chained Bucket Hashing for Main Memory Databases

The objective of this paper is to develop a high performance hash-based access method for main memory database systems. Chained bucket hashing is known to provide the fastest random access to a static file stored in main memory. For a dynamic file, however, chained bucket hashing is inappropriate because its address space cannot be adapted to the file size without total reorganization. Extendib...


Fast Information-Theoretic Agglomerative Co-clustering

Our algorithm iteratively merges those clusters whose merge yields a lower objective cost. However, operations such as finding nearest neighbors or closest pair of clusters are expensive, especially in high dimensions. To quickly find highly similar clusters to be merged, we exploit the Locality-Sensitive Hashing (LSH) technique, which we briefly describe in this section. Simply put, LSH [2] is...




Publication date: 2015